1 Introduction

Let $\mathcal{F}$ be a set of real-valued functions on a set $\mathbb{X}$ and let $S : \mathcal{F} \to \mathcal{G}$ be an arbitrary mapping. We
consider the problem of making inference about $S(f)$, with $f \in \mathcal{F}$ unknown, from a finite set of pointwise
evaluations of $f$. We are mainly interested in the problems of approximation and optimization.
Formally, a deterministic algorithm to infer a quantity of interest $S(f)$ from a set of $n$ evaluations of
$f$ is a pair $(\underline{X}_n, S_n)$, consisting of a deterministic search strategy

$$\underline{X}_n : f \mapsto \underline{X}_n(f) = (X_1(f), X_2(f), \ldots, X_n(f)) \in \mathbb{X}^n$$

and a mapping $S_n : \mathcal{F} \to \mathcal{G}$, such that:

a) $X_1(f) = x_1$, for some arbitrary $x_1 \in \mathbb{X}$.

b) For all $1 \le i < n$, $X_{i+1}(f)$ depends measurably on $\mathcal{I}_i(f)$, where $\mathcal{I}_i = ((X_1, Z_1), \ldots, (X_i, Z_i))$ and
$Z_i(f) = f(X_i(f))$, $1 \le i \le n$.

c) There exists a measurable function $\phi_n$ such that $S_n = \phi_n \circ \mathcal{I}_n$.
The algorithm $(\underline{X}_n, S_n)$ describes a sequence of decisions, made from an increasing amount of
information: for each $i = 1, \ldots, n - 1$, the algorithm uses information $\mathcal{I}_i(f)$ to choose the next evaluation
point $X_{i+1}(f)$. The estimator $S_n(f)$ of $S(f)$ is the terminal decision. We shall denote by $\mathcal{A}_n$ the class
of all strategies $\underline{X}_n$ that query sequentially $n$ evaluations of $f$ and also define the subclass $\mathcal{A}_n^0 \subset \mathcal{A}_n$
of non-adaptive strategies, that is, the class of all strategies such that the $X_i$s do not depend on $f$.
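
The definition above can be read as a simple evaluation loop. The following is a minimal sketch, not part of the original formalism: the names `run_algorithm`, `grid_rule`, `probe_near_best` and `best_value`, the domain $[0, 1]$ and the test function are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Info = List[Tuple[float, float]]  # I_i = ((X_1, Z_1), ..., (X_i, Z_i))


def run_algorithm(f, x1, next_point, estimator, n):
    """Run n sequential evaluations of f and return (S_n(f), I_n(f))."""
    info: Info = []
    x = x1                        # a) the first point is fixed in advance
    for i in range(n):
        info.append((x, f(x)))    # Z_i(f) = f(X_i(f))
        if i < n - 1:
            x = next_point(info)  # b) X_{i+1} is a function of I_i only
    return estimator(info), info  # c) S_n = phi_n(I_n)


def grid_rule(n):
    """Non-adaptive rule on [0, 1]: ignores the observed values Z_i."""
    return lambda info: len(info) / (n - 1)


def probe_near_best(info):
    """Adaptive rule (illustrative): the next point depends on f through the Z_i."""
    best_x, _ = max(info, key=lambda pair: pair[1])
    step = 0.5 ** len(info)
    return min(1.0, best_x + step) if len(info) % 2 else max(0.0, best_x - step)


def best_value(info):
    """Terminal decision for optimization: the largest observed value."""
    return max(z for _, z in info)


f = lambda x: -(x - 0.3) ** 2  # hypothetical test function
estimate, info = run_algorithm(f, 0.0, grid_rule(5), best_value, 5)
```

With `grid_rule` the evaluation points are the same for every `f`, so the strategy is non-adaptive; with `probe_near_best` they change with the observed values, so the strategy is adaptive.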

A classical approach to study the performance of a sequential strategy is to consider the worst
error of estimation on some class of functions $\mathcal{F}$

$$e_{\mathrm{worst\,case}}(\underline{X}_n) := \sup_{f \in \mathcal{F}} L(S(f), S_n(f)) ,$$



where $L$ is a loss function. There are many results dealing with the problems of function approximation
and optimization in the worst-case setting. Two notable results concern convex and symmetric
classes of bounded functions. For such classes, from a worst-case point of view, any strategy will
behave similarly for the problem of global optimization and that of function approximation. Moreover,
the use of adaptive methods cannot be justified by a worst-case analysis (see, e.g., Novak,
Propositions 1.3.2 and 1.3.3). These results, combined with the fact that most optimization algorithms
are adaptive, suggest that the worst-case setting may not be the most appropriate framework
to assess the performance of a search algorithm in practice. Indeed, it is also important, in
practice, to know whether the loss $L(S(f), S_n(f))$ is close to, or on the contrary much smaller than,
$e_{\mathrm{worst\,case}}(\underline{X}_n)$ for "typical" functions $f \in \mathcal{F}$ not corresponding to worst cases. To address this question, a
classical approach is to adopt a Bayesian point of view.

In this paper, we consider methods where $f$ is seen as a sample path of a real-valued random
process $\xi$ defined on some probability space $(\Omega, \mathcal{B}, P_0)$ with parameter in $\mathbb{X}$. Then, $(X_n(\xi))_{n \ge 1}$ is a random
sequence in $\mathbb{X}$, with the property that $X_{n+1}(\xi)$ is measurable with respect to the $\sigma$-algebra generated
by $\xi(X_1(\xi)), \ldots, \xi(X_n(\xi))$. From a Bayesian decision-theoretic point of view, the random process $\xi$
represents prior knowledge about $f$ and makes it possible to infer a quantity of interest before
evaluating the function. This point of view has been widely explored in the domain of optimization and
computer experiments. Under this setting, the performance of a given strategy $\underline{X}_n$ can be assessed by
studying the average loss

$$e_{\mathrm{average}}(\underline{X}_n) := E\big[ L(S(\xi), S_n(\xi)) \big] .$$

How much does adaptation help on the average, and is it possible to derive rates of decay for errors in
average? In this article, we shall make a brief review of results concerning average error bounds of
Bayesian search methods based on a random process prior.

This article has three parts. The precise assumptions about $\xi$ are given in Section 2. Section 3 deals
with the problem of function approximation, while Section 4 deals with the problem of optimization.



2 Framework 



Let $\xi$ be a random process defined on a probability space $(\Omega, \mathcal{B}, P_0)$, with parameter $x \in \mathbb{R}^d$. Assume
moreover that $\xi$ has a zero mean and a continuous covariance function $k$. The kriging predictor of $\xi(x)$,
based on the observations $\xi(X_i(\xi))$, $i = 1, \ldots, n$, is the orthogonal projection

$$\hat\xi_n(x; \underline{X}_n(\xi)) := \sum_{i=1}^{n} \lambda^i(x; \underline{X}_n(\xi))\, \xi(X_i(\xi))$$

of $\xi(x)$ onto $\mathrm{span}\{\xi(X_i(\xi)),\ i = 1, \ldots, n\}$ in $L^2(\Omega, \mathcal{B}, P_0)$. At step $n \ge 1$, given evaluation points $\underline{X}_n(\xi)$,
the kriging coefficients $\lambda^i(x; \underline{X}_n(\xi))$ can be obtained by solving a system of linear equations (see, e.g.,
Chilès and Delfiner, 1999). Note that for any sample path $f = \xi(\omega, \cdot)$, $\omega \in \Omega$, the value $\hat\xi_n(\omega, x)$ is a
function of $\mathcal{I}_n(f)$ only.
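
As an illustration of the linear system mentioned above, the following sketch computes kriging coefficients for a zero-mean process with a known covariance by solving $K \lambda(x) = k(x)$, where $K$ is the covariance matrix of the observations and $k(x)$ the vector of covariances between the observations and $\xi(x)$. The covariance `matern32` (a Matérn covariance with regularity 3/2 in dimension one), its range parameter `rho` and the function names are assumptions made for the example, not choices taken from the text.

```python
import numpy as np


def matern32(x, y, rho=0.3):
    """Matérn covariance with regularity 3/2 in d = 1 (hypothetical choice)."""
    r = np.abs(np.subtract.outer(x, y)) / rho
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def kriging_coefficients(x, design, k=matern32):
    """Solve K lambda(x) = k(x); one column of coefficients per prediction point."""
    K = k(design, design)             # (n, n) covariance of the observations
    kx = k(design, np.atleast_1d(x))  # (n, m) covariances with xi(x)
    return np.linalg.solve(K, kx)


def kriging_predictor(x, design, z, k=matern32):
    """Zero-mean kriging predictor: sum_i lambda^i(x) z_i."""
    return kriging_coefficients(x, design, k).T @ z


design = np.array([0.0, 0.2, 0.5, 0.7, 1.0])
z = np.sin(3.0 * design)              # hypothetical observed values
prediction = kriging_predictor(np.linspace(0.0, 1.0, 11), design, z)
```

The predictor depends on the sample path only through the pairs (`design`, `z`), which is the content of the last remark above.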

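The mean-square error (MSE) of estimation at a fixed point $x \in \mathbb{R}^d$ will be denoted by

$$\sigma_n^2(x) := E\big[ (\xi(x) - \hat\xi_n(x; \underline{X}_n(\xi)))^2 \big] .$$

It is generally not possible to compute $\sigma_n^2(x)$ when $\underline{X}_n$ is an adaptive strategy.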

Regularity assumptions. Assume that there exists $\Phi : \mathbb{R}^d \to \mathbb{R}$ such that $k(x, y) = \Phi(x - y)$, which
is in $L^1(\mathbb{R}^d)$ and has a Fourier transform

$$\hat\Phi(u) = (2\pi)^{-d/2} \int_{\mathbb{R}^d} \Phi(x)\, e^{i (x, u)}\, dx$$

that satisfies

$$(2) \qquad c_1 (1 + \|u\|_2^2)^{-s} \le \hat\Phi(u) \le c_2 (1 + \|u\|_2^2)^{-s} , \qquad u \in \mathbb{R}^d ,$$

with $s > d/2$ and constants $0 < c_1 \le c_2$. Note that the Matérn covariance with regularity parameter $\nu$
(see, e.g., Stein, 1999) satisfies such a regularity assumption, with $s = \nu + d/2$. Tensor-product
covariance functions, however, never satisfy such a condition (see Ritter, 2000, Chapter 7, for some
results in this case).

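Let $\mathcal{H}$ be the RKHS of functions generated by $k$. Denote by $(\cdot, \cdot)_{\mathcal{H}}$ the inner product of $\mathcal{H}$, and
by $\|\cdot\|_{\mathcal{H}}$ the corresponding norm. It is well known (see, e.g., Wendland, 2005) that $\mathcal{H}$ is the Sobolev
space

$$W_2^s(\mathbb{R}^d) = \big\{ f \in L^2(\mathbb{R}^d) ;\ \hat f(\cdot) (1 + \|\cdot\|_2^2)^{s/2} \in L^2(\mathbb{R}^d) \big\}$$

due to the following result.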
Proposition 1. $\mathcal{H} \subset L^2(\mathbb{R}^d)$ and, for all $f \in \mathcal{H}$, the RKHS norm $\|f\|_{\mathcal{H}}$ is equivalent to the Sobolev norm $\|f\|_{W_2^s(\mathbb{R}^d)}$.

3 Approximation

We first consider the problem of approximation, with the point of view exposed in Section 2. Using
the notations introduced above, the problem of approximation corresponds to considering operators
$S$ and $S_n$ defined by $S(\xi) := \xi_{|\mathbb{X}}$ and $S_n(\xi) := \hat\xi_{n|\mathbb{X}}$, with $\mathbb{X} \subset \mathbb{R}^d$ a compact domain with
non-empty interior. For the design of computer experiments, classical criteria for assessing the quality of
a strategy $\underline{X}_n \in \mathcal{A}_n$ for the approximation problem are the maximum mean-square error (MMSE)

$$e_{\mathrm{MMSE}}(\underline{X}_n) := \sup_{x \in \mathbb{X}} E\big[ (\xi(x) - \hat\xi_n(x; \underline{X}_n(\xi)))^2 \big] = \sup_{x \in \mathbb{X}} \sigma_n^2(x)$$

and the integrated mean-square error (IMSE)

$$e_{\mathrm{IMSE}}(\underline{X}_n) := E\big( \|\xi - \hat\xi_n\|^2_{L^2(\mathbb{X}, \mu)} \big) = \int_{\mathbb{X}} \sigma_n^2(x)\, \mu(dx)$$

(see, e.g., Sacks et al., 1989; Currin et al., 1991; Welch et al., 1992; Santner et al., 2003). These criteria
correspond to G-optimality and I-optimality in the theory of (parametric) optimal design.
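
For a non-adaptive design, both criteria can be approximated numerically from the standard kriging variance formula $\sigma^2(x; x_1, \ldots, x_n) = k(x, x) - k(x)^{\mathsf T} K^{-1} k(x)$. The sketch below is only an illustration under several assumptions that are not in the text: the domain is $[0, 1]$, $\mu$ is taken to be the Lebesgue measure, the supremum and the integral are replaced by a maximum and a mean over a fine grid, and the covariance `matern32` and its parameter `rho` are hypothetical.

```python
import numpy as np


def matern32(x, y, rho=0.3):
    """Matérn covariance with regularity 3/2 in d = 1 (hypothetical choice)."""
    r = np.abs(np.subtract.outer(x, y)) / rho
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def kriging_mse(x, design, k=matern32):
    """MSE of the kriging predictor at the points x for a fixed design."""
    K = k(design, design)
    kx = k(design, x)                           # (n, m)
    lam = np.linalg.solve(K, kx)                # kriging coefficients
    mse = np.diag(k(x, x)) - np.sum(kx * lam, axis=0)
    return np.clip(mse, 0.0, None)              # remove tiny negative round-off


def mmse_imse(design, grid):
    """Grid approximations of the MMSE and IMSE criteria on [0, 1]."""
    mse = kriging_mse(grid, design)
    return float(mse.max()), float(mse.mean())  # mean approximates the integral


grid = np.linspace(0.0, 1.0, 501)
mmse5, imse5 = mmse_imse(np.linspace(0.0, 1.0, 5), grid)
mmse9, imse9 = mmse_imse(np.linspace(0.0, 1.0, 9), grid)
```

Since the 9-point design contains the 5-point design, both criteria can only decrease when moving from the first to the second.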

As mentioned earlier, computing $\sigma_n^2(x)$ is usually not possible in the case of adaptive sampling
strategies, even for a Gaussian process. From a theoretical point of view, however, it is important to
know if adaptive strategies can improve upon non-adaptive strategies for the approximation problem.

Proposition 2. Assume that $\xi$ is a Gaussian process. Then adaptivity does not help for the
approximation problem, with respect to either the MMSE or the IMSE criterion.

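Proof. For any adaptive strategy $\underline{X}_n$, it can be proved by induction (using the fact that $X_{i+1}$ only
depends on $\mathcal{I}_i$) that, for each $x \in \mathbb{X}$,

$$(3) \qquad \sigma_n^2(x) = E\big[ \sigma^2(x; X_1(\xi), \ldots, X_n(\xi)) \big] ,$$

where $\sigma^2(x; x_1, \ldots, x_n)$, $x_1, \ldots, x_n \in \mathbb{X}$, denotes the MSE at $x$ of the non-adaptive strategy that selects
the points $x_1, \ldots, x_n$. Therefore, for each $x \in \mathbb{X}$,

$$\sigma_n^2(x) \ge \min_{x_1, \ldots, x_n \in \mathbb{X}} \sigma^2(x; x_1, \ldots, x_n) ,$$

which proves the claim in the case of the MMSE criterion. Similarly, integrating (3) yields

$$\int_{\mathbb{X}} \sigma_n^2 \, d\mu = E\Big[ \int_{\mathbb{X}} \sigma^2(x; \underline{X}_n(\xi))\, \mu(dx) \Big] \ge \min_{x_1, \ldots, x_n \in \mathbb{X}} \int_{\mathbb{X}} \sigma^2(x; x_1, \ldots, x_n)\, \mu(dx) ,$$

which proves the claim in the case of the IMSE criterion. □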
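
The identity (3) can be unpacked as follows. This is a sketch of the underlying step rather than the induction itself; it assumes the standard property of Gaussian conditioning that the conditional variance given the observations depends on the evaluation locations but not on the observed values, and it writes $\mathcal{I}_n$ for the information available after $n$ evaluations.

```latex
% Sketch under the Gaussian assumption of Proposition 2.
\begin{align*}
E\big[ (\xi(x) - \hat\xi_n(x; \underline{X}_n(\xi)))^2 \,\big|\, \mathcal{I}_n \big]
  &= \sigma^2\big(x; X_1(\xi), \ldots, X_n(\xi)\big) \quad \text{a.s.}, \\
\sigma_n^2(x)
  &= E\Big[ E\big[ (\xi(x) - \hat\xi_n(x; \underline{X}_n(\xi)))^2 \,\big|\, \mathcal{I}_n \big] \Big]
   = E\big[ \sigma^2\big(x; X_1(\xi), \ldots, X_n(\xi)\big) \big] .
\end{align*}
```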



In the case of the IMSE criterion, Proposition 2 can be seen as a special case of a general result
about linear problems (see, e.g., Ritter, 2000, Chapter 7). The following proposition establishes a
connection between the MMSE criterion and the worst-case $L^\infty$-error of approximation in the unit
ball of $\mathcal{H}$, which will be useful to establish the optimal rate for IMSE- and MMSE-optimal designs.

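Proposition 3. Let $\mathcal{H}_1$ denote the unit ball of $\mathcal{H}$. For any non-adaptive strategy $\underline{X}_n \in \mathcal{A}_n^0$, the MMSE
criterion equals the squared worst-case $L^\infty$-error of approximation in $\mathcal{H}_1$ using $S_n$:

$$e_{\mathrm{MMSE}}(\underline{X}_n) = \sup_{f \in \mathcal{H}_1} \|S(f) - S_n(f)\|^2_{L^\infty(\mathbb{X})} .$$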

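Proof. Let $\underline{X}_n \in \mathcal{A}_n^0$ be a non-adaptive strategy such that $X_i(\xi) = x_i$, $i = 1, \ldots, n$, for some
arbitrary $x_i$s in $\mathbb{X}$. Denote by $\lambda^i(x) = \lambda^i(x; \underline{X}_n(\xi))$ the corresponding kriging coefficients (which do not
depend on $\xi$). Using the fact that the mapping $\xi(x) \mapsto k(x, \cdot)$ extends linearly to an isometry from
$\overline{\mathrm{span}}\{\xi(y),\ y \in \mathbb{R}^d\}$ to $\mathcal{H}$, we have for all $x \in \mathbb{X}$

$$\begin{aligned}
\sigma_n(x) &= \Big\| k(x, \cdot) - \sum_i \lambda^i(x)\, k(x_i, \cdot) \Big\|_{\mathcal{H}} \\
&= \sup_{f \in \mathcal{H}_1} \Big( f ,\ k(x, \cdot) - \sum_i \lambda^i(x)\, k(x_i, \cdot) \Big)_{\mathcal{H}} \\
&= \sup_{f \in \mathcal{H}_1} (f - S_n f)(x) .
\end{aligned}$$

Thus,

$$\sup_{x \in \mathbb{X}} \sigma_n(x) = \sup_{x \in \mathbb{X}} \sup_{f \in \mathcal{H}_1} (f - S_n f)(x) = \sup_{f \in \mathcal{H}_1} \|f - S_n f\|_{L^\infty(\mathbb{X})} .$$

□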

The following proposition summarizes known results concerning the optimal rate of decay in the
class of non-adaptive strategies for both the IMSE criterion and the MMSE criterion. Note that, by
Proposition 2, this rate is also the optimal rate of decay in the class of all adaptive strategies if $\xi$ is a
Gaussian process.

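Proposition 4. Assume that $\xi$ has a continuous covariance function satisfying the regularity
assumptions of Section 2 and let $\nu = s - d/2 > 0$. Then there exists $C_1 > 0$ such that, for any $\underline{X}_n \in \mathcal{A}_n^0$,

$$(4) \qquad C_1 n^{-2\nu/d} \le e_{\mathrm{IMSE}}(\underline{X}_n) \le \mu(\mathbb{X})\, e_{\mathrm{MMSE}}(\underline{X}_n) .$$

Moreover, if $\mathbb{X}$ has a Lipschitz boundary and satisfies an interior cone condition, then there exists
$C_2 > 0$ such that

$$(5) \qquad \inf_{\underline{X}_n \in \mathcal{A}_n^0} e_{\mathrm{IMSE}}(\underline{X}_n) \le \mu(\mathbb{X}) \inf_{\underline{X}_n \in \mathcal{A}_n^0} e_{\mathrm{MMSE}}(\underline{X}_n) \le C_2 n^{-2\nu/d} .$$

The optimal rate of decay is therefore $n^{-2\nu/d}$ for both criteria.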



Proof. It is proved in (Ritter, 2000, Chapter 7, Proposition 8) that there exists $C_1 > 0$ such that
$e_{\mathrm{IMSE}}(\underline{X}_n) \ge C_1 n^{-2\nu/d}$ in the case where $\mathbb{X} = [0; 1]^d$. This readily proves the lower bound (4) since
any $\mathbb{X}$ with non-empty interior contains a hypercube on which Ritter's result holds.

If $\mathbb{X}$ is a bounded Lipschitz domain satisfying an interior cone condition, then (Narcowich et al.,
2005, Proposition 3.2) there exists $c_1 > 0$ such that $\|S(f) - S_n(f)\|_{L^\infty(\mathbb{X})} \le c_1 h_n^{\nu} \|S(f)\|_{W_2^s(\mathbb{X})}$ for
all $f \in \mathcal{H}$, where $h_n = \sup_{x \in \mathbb{X}} \min_{i \in \{1, \ldots, n\}} \|x - X_i(f)\|_2$ is the fill distance of the non-adaptive strategy
$\underline{X}_n$ in $\mathbb{X}$. Therefore

$$\|S(f) - S_n(f)\|_{L^\infty(\mathbb{X})} \le c_1 h_n^{\nu} \|S(f)\|_{W_2^s(\mathbb{X})} \le c_1 h_n^{\nu} \|f\|_{W_2^s(\mathbb{R}^d)} \le c_2 h_n^{\nu} \|f\|_{\mathcal{H}}$$

for some $c_2 > 0$, using the equivalence of the Sobolev $W_2^s(\mathbb{R}^d)$ norm with the RKHS norm (see
Section 2). Considering any non-adaptive space-filling strategy $\underline{X}_n$ with a fill distance $h_n = O(n^{-1/d})$
yields

$$\inf_{\underline{X}_n \in \mathcal{A}_n^0} \sup_{f \in \mathcal{H}_1} \|f - S_n f\|_{L^\infty(\mathbb{X})} \le c_3 n^{-\nu/d}$$

for some $c_3 > 0$, and the upper bound (5) then follows from Proposition 3.

□



Finding a non-adaptive MMSE-optimal design is a difficult non-convex optimization problem in
$nd$ dimensions. Instead of addressing directly such a high-dimensional global optimization problem,
we can use the classical sequential non-adaptive greedy strategy $\underline{X}_n(\cdot) = (x_1, \ldots, x_n) \in \mathbb{X}^n$ defined
by

$$(6) \qquad x_{i+1} = \arg\max_{x \in \mathbb{X}} \sigma^2(x; x_1, \ldots, x_i) , \qquad 1 \le i < n .$$

Of course, the strategy is suboptimal but it only involves simpler optimization problems in $d$ dimensions
and has the advantage that it can be stopped at any time. Following Binev et al. (2010), it can be
established that this greedy strategy is rate optimal.
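
A greedy strategy of this kind, which repeatedly adds the point where the current kriging MSE is largest, can be sketched as follows. The sketch replaces the maximization over the domain by a search over a finite candidate grid on $[0, 1]$ and uses a hypothetical Matérn covariance; the names `matern32`, `kriging_mse`, `greedy_design` and the starting point are assumptions made for the example.

```python
import numpy as np


def matern32(x, y, rho=0.3):
    """Matérn covariance with regularity 3/2 in d = 1 (hypothetical choice)."""
    r = np.abs(np.subtract.outer(x, y)) / rho
    return (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)


def kriging_mse(x, design, k=matern32):
    """MSE of the kriging predictor at the points x for a fixed design."""
    kx = k(design, x)
    lam = np.linalg.solve(k(design, design), kx)
    return np.clip(np.diag(k(x, x)) - np.sum(kx * lam, axis=0), 0.0, None)


def greedy_design(n, candidates, x1):
    """Add, one at a time, the candidate with the largest current MSE."""
    design = [x1]
    while len(design) < n:
        mse = kriging_mse(candidates, np.array(design))
        design.append(float(candidates[np.argmax(mse)]))
    return np.array(design)


candidates = np.linspace(0.0, 1.0, 201)
design = greedy_design(8, candidates, x1=0.5)
```

The selection rule never looks at observed values of the function, only at the MSE, so the resulting design is non-adaptive even though it is built sequentially; stopping the loop early simply returns a shorter prefix of the same design.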

Proposition 5. Assume that $\xi$ has a continuous covariance function satisfying the regularity
assumptions of Section 2 and let $\nu = s - d/2 > 0$. Let $\underline{X}_n$ be the sequential strategy defined by (6).
Then,

$$e_{\mathrm{MMSE}}(\underline{X}_n) = O(n^{-2\nu/d}) .$$

Proof. Theorem 3.1 in Binev et al. (2010), applied to the compact subset $\{\xi(x),\ x \in \mathbb{X}\}$ in $L^2(\Omega, \mathcal{B}, P_0)$,
states that the greedy algorithm (6) preserves polynomial rates of decay. The result follows from
Proposition 4. □



4 Optimization 

In this section, we consider the problem of global optimization on a compact domain $\mathbb{X} \subset \mathbb{R}^d$,
which corresponds formally to operators $S$ and $S_n$ defined by $S(\xi) = \sup_{x \in \mathbb{X}} \xi(x)$ and $S_n(\xi) =
\max_{i \in \{1, \ldots, n\}} \xi(X_i(\xi))$.

In a Bayesian setting, a classical criterion to assess the performance of an optimization procedure
is the average error

$$e_{\mathrm{OPT}}(\underline{X}_n) := E\big( S(\xi) - S_n(\xi) \big) .$$

Although it is not possible in the context of this article to make a comprehensive review of
known results concerning the average case in the Gaussian case, it can safely be said that
such results are scarce and specific.

In fact, most available results about the average-case error concern the one-dimensional Wiener
process $\xi$ on the interval $[0, 1]$. Under this setting, Ritter (1990) shows that the average error of
the best non-adaptive optimization procedure decreases at rate $n^{-1/2}$ (extensions of this result for
non-adaptive algorithms and the $r$-fold Wiener measure can be found in Wasilkowski, 1992). Under
the same assumptions for $\xi$, Calvin (1997) derives the exact limiting distribution of the error of a
particular adaptive algorithm, which suggests that adaptivity does yield a better average error for the
optimization problem: the result is that, for any $0 < \delta < 1$, it is possible to find an adaptive strategy
such that $n^{(1-\delta)} (S(\xi) - S_n(\xi))$ converges in distribution.
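
The average error of a non-adaptive procedure on the Wiener process can also be probed by simulation. The sketch below is a rough Monte Carlo illustration, not a reproduction of any of the cited analyses: it assumes equispaced evaluation points $i/n$, approximates the supremum of each path by its maximum over a fine grid of `m` steps (which requires `n` to divide `m`), and the function name and default parameters are hypothetical.

```python
import numpy as np


def average_opt_error(n, n_paths=1000, m=2048, seed=0):
    """Monte Carlo estimate of E[S(xi) - S_n(xi)] for Brownian motion on [0, 1]."""
    rng = np.random.default_rng(seed)
    increments = rng.normal(scale=np.sqrt(1.0 / m), size=(n_paths, m))
    paths = np.cumsum(increments, axis=1)                # xi(j/m), j = 1..m
    paths = np.hstack([np.zeros((n_paths, 1)), paths])   # prepend xi(0) = 0
    s = paths.max(axis=1)                                # fine-grid proxy for the supremum
    idx = np.arange(1, n + 1) * m // n                   # evaluation points i/n
    s_n = paths[:, idx].max(axis=1)                      # best observed value
    return float(np.mean(s - s_n))


e16 = average_opt_error(16)
e64 = average_opt_error(64)
ratio = e16 / e64
```

Because the same seed is used for both calls, the two estimates are computed on the same simulated paths. Multiplying the number of evaluations by four should roughly halve the error if it decays like $n^{-1/2}$, although the fine-grid approximation of the supremum biases the estimates slightly.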
A theoretical result concerning the optimal average-error criterion for less restrictive Gaussian
priors is also available. If the covariance of a Gaussian process $\xi$ is $\alpha$-Hölder continuous, then Grünewälder
et al. (2010) show that a space-filling strategy $\underline{X}_n$ achieves

$$(7) \qquad e_{\mathrm{OPT}}(\underline{X}_n) = O\big( n^{-\alpha/(2d)} (\log n)^{1/2} \big) .$$

Thus, under the assumptions of Section 2, for a Matérn covariance with regularity parameter $\nu$, the
rate of the optimal average error of estimation of the optimum is less than $n^{-\nu/d} (\log n)^{1/2}$ (since a
Matérn covariance is $\alpha$-Hölder continuous with $\alpha = 2\nu$). Note that this bound is not sharp in general
since the optimal non-adaptive rate is $n^{-1/2}$ for the Brownian motion on $[0; 1]$, the covariance function
of which is $\alpha$-Hölder continuous with $\alpha = 1$.

In view of these results, we can safely say that characterizing the average behavior of adaptive
sequential optimization algorithms is still an open (and apparently difficult) problem. At present, the
only way to draw useful conclusions about the interest of a particular optimization algorithm is to
resort to numerical simulations. Empirical studies such as the one presented in Benassi et al. (2011),
for instance, are therefore very useful from a practical point of view, since they make it possible to
obtain fine and sound performance assessments of any strategy with a reasonable computational cost.